Abstract

Acyclic directed mixed graphs (ADMGs) are graphs that contain directed
($\to$) and bidirected ($\leftrightarrow$) edges, subject to the constraint that
there are no cycles of directed edges. Such graphs may be used to represent the
conditional independence structure induced by a DAG model containing hidden
variables on its observed margin. The Markovian model associated with an ADMG is
simply the set of distributions obeying the global Markov property, given via
a simple path criterion (m-separation). We first present a factorization
criterion characterizing the Markovian model that generalizes the well-known
recursive factorization for DAGs. For the case of finite discrete random
variables, we also provide a parametrization of the model in terms of simple
conditional probabilities, and characterize its variation dependence. We
show that the induced models are smooth. Consequently Markovian ADMG
models for discrete variables are curved exponential families of distributions.



1 Introduction 



A directed graph is a finite collection of vertices, $V$, together with a collection
of ordered pairs $E \subseteq V \times V$ such that $(v,v) \notin E$ for any $v$; if
$(v,w) \in E$ we write $v \to w$. $E$ is the (directed) edge set. We say a directed
graph is acyclic if it contains no directed cycles; that is, there is no sequence of
vertices $v_1 \to v_2 \to \cdots \to v_k \to v_1$, for any $k > 1$. We call such a
graph a directed acyclic graph (DAG). DAGs are popular because of their simple
definition in terms of a recursive factorization, easy to determine conditional
independence constraints, and potential for causal interpretations (Spirtes et al.,
1993; Pearl, 1995, 2009; Robins and Richardson, 2011). Unfortunately, if some of the
variables in a DAG are unobserved, the resulting pattern of conditional independences
no longer corresponds to a DAG model (on the observed variables); in this sense,
DAGs are not closed under marginalization.

Figure 1: An acyclic directed mixed graph, $\mathcal{G}_1$.

An acyclic directed mixed graph (ADMG) consists of a DAG with vertices $V$ and
edges $E$, together with a collection $B$ of unordered (distinct) pairs of elements of
$V$. If $\{v, w\} \in B$ we write $v \leftrightarrow w$, and if in addition
$(v, w) \in E$ this is denoted $v \mathrel{\substack{\to \\ \leftrightarrow}} w$.
An example of an ADMG is given in Figure 1.

Like DAGs, acyclic directed mixed graphs can be interpreted, via a Markov property,
as representing a set of probability distributions defined by conditional
independence restrictions; these can be read off the graph using a graphical
separation criterion. The advantage that ADMGs have is that they are closed under
marginalization, in the sense mentioned above (Richardson and Spirtes); indeed
they represent precisely the conditional independence relations which can be
obtained by marginalizing DAGs. Richardson (2003) gave a global Markov property
and ordered local Markov property for ADMG models, and showed their
equivalence.

Evans and Richardson (2013) provide a number of applied examples, and discuss
the relation between Markovian ADMG models and marginal log-linear models
(Bergsma and Rudas, 2002; Bartolucci et al., 2007). ADMGs also arise in studying
general conditions for identifying intervention distributions, under the causal
interpretation of a DAG model (see Pearl and Robins, 1995; Tian and Pearl, 2002;
Shpitser and Pearl, 2006a,b; Huang and Valtorta, 2006; Dawid and Didelez, 2010).

ADMGs may also be viewed as a subclass of the larger classes of summary
graphs (Wermuth, 2011) and ribbonless mixed graphs (Sadeghi and Lauritzen,
2011; Sadeghi, 2012), which also allow undirected edges. The factorization and
parametrization developed here may be extended to these larger classes without
difficulty.

The remainder of the paper is organized as follows: Section 2 introduces basic
graphical concepts. In Section 3 we give conditions under which a partial ordering
on a class of subsets may be used to define partitions of arbitrary subsets. In
Section 4 we use these tools to develop our factorization criterion, which then
forms the basis of the simple parametrization introduced in Section 5. In Section
6 we show that the Markov model associated with an ADMG is smooth, and
characterize the variation dependence of the parametrization. Finally, Section 7
contains a brief discussion.

2 Graphical Definitions and Markov Properties 

Let $\mathcal{G}$ be an acyclic directed mixed graph with vertices $V$; the induced
subgraph of $\mathcal{G}$ over $A \subseteq V$, denoted $\mathcal{G}_A$, is the graph
with vertex set $A$, and all those edges which join two vertices that are both in $A$.

A path in $\mathcal{G}$ is a sequence of adjacent edges, without repetition of a
vertex; a path may be empty, or equivalently consist of only one vertex. The first
and last vertices on a path are the endpoints (these are not distinct if the path is
empty); other vertices on the path are non-endpoints. A directed path is one in which
all the edges are directed ($\to$) and are oriented in the same direction, whereas a
bidirected path consists entirely of bidirected edges.

We use the usual familial terminology for vertices in a graph. If $w \to v$ we say
that $w$ is a parent of $v$; the set of parents of $v$ is denoted
$\mathrm{pa}_{\mathcal{G}}(v)$. In addition, $w$ is an ancestor of $v$ if $v = w$ or
if there is a directed path from $w$ to $v$; conversely $v$ is a descendant of $w$.
A set of vertices $A$ is ancestral if $A = \mathrm{an}_{\mathcal{G}}(A)$; that is,
$A$ contains all its own ancestors.

The district containing $v$ is the set of vertices $w$ such that
$v \leftrightarrow \cdots \leftrightarrow w$, including $v$ itself. The ancestors,
descendants and district of $v$ are denoted $\mathrm{an}_{\mathcal{G}}(v)$,
$\mathrm{de}_{\mathcal{G}}(v)$ and $\mathrm{dis}_{\mathcal{G}}(v)$ respectively. We
apply these functions disjunctively to sets so that, for example,

$$\mathrm{pa}_{\mathcal{G}}(W) = \bigcup_{v \in W} \mathrm{pa}_{\mathcal{G}}(v).$$

We will also use the notation $\mathrm{dis}_A(v)$ as a shorthand for
$\mathrm{dis}_{\mathcal{G}_A}(v)$, the district containing $v$ in the induced
subgraph of $\mathcal{G}$ on $A$.

For an ADMG $\mathcal{G}$ with vertex set $V$, we consider collections of random
variables $(X_v)_{v \in V}$ taking values in probability spaces
$(\mathfrak{X}_v)_{v \in V}$; these spaces are either finite discrete sets or
finite-dimensional real vector spaces. For $A \subseteq V$ we let
$\mathfrak{X}_A = \times_{v \in A} (\mathfrak{X}_v)$, $\mathfrak{X} = \mathfrak{X}_V$
and $X_A = (X_v)_{v \in A}$. We abuse notation in the usual way: $v$ denotes both a
vertex and the random variable $X_v$, likewise $A$ denotes both a set of vertices and
the random vector $X_A$. For fixed elements of $\mathfrak{X}_v$ and $\mathfrak{X}_A$
we write $x_v$ and $x_A$ respectively.

The relationship between a graph $\mathcal{G}$ and random variables $X_V$ is
governed by Markov properties specified in terms of paths.

A non-endpoint vertex $c$ on a path $\pi$ is a collider on $\pi$ if the edges
preceding and succeeding $c$ on the path have an arrowhead at $c$, for example
$\to c \leftarrow$ or $\leftrightarrow c \leftarrow$; otherwise $c$ is a non-collider.

Definition 2.1. A path $\pi$ in $\mathcal{G}$ between two vertices
$v, w \in V(\mathcal{G})$ is said to be blocked by a set
$C \subseteq V \setminus \{v, w\}$ if either:

(i) there is a non-collider on $\pi$ and in $C$; or

(ii) there is a collider on $\pi$ which is not in $\mathrm{an}_{\mathcal{G}}(C)$.

We say $v$ and $w$ are m-separated given $C$ in $\mathcal{G}$ if every path from $v$
to $w$ in $\mathcal{G}$ is blocked by $C$. Note that $C$ may be empty. Sets
$A, B \subseteq V$ are said to be m-separated given
$C \subseteq V \setminus (A \cup B)$ if every pair $a \in A$ and $b \in B$ are
m-separated given $C$.



The special case of m-separation for DAGs is the better known d-separation (Pearl,
1988; Lauritzen, 1996). We next relate m-separation to conditional independence,
for which we use the now standard notation of Dawid (1979): for random variables
$X$, $Y$ and $Z$ we denote the statement '$X$ is independent of $Y$ conditional on
$Z$' by $X \perp\!\!\!\perp Y \mid Z$. If $Z$ is empty we write
$X \perp\!\!\!\perp Y$.



Definition 2.2. A probability measure $P$ on $\mathfrak{X}$ is said to satisfy the
global Markov property (GMP) for an acyclic directed mixed graph $\mathcal{G}$, if
for all disjoint sets $A, B, C \subseteq V$ with $A$ and $B$ non-empty, $A$ being
m-separated from $B$ given $C$ implies that $X_A \perp\!\!\!\perp X_B \mid X_C$
under $P$.

Consider the ADMG $\mathcal{G}_1$ in Figure 1; the vertices 1 and 4 are m-separated
conditional on 2, and 1 and 3 are m-separated unconditionally. It is not hard to
verify that no other m-separation relations hold for this graph, and that therefore
a distribution $P$ obeys the global Markov property with respect to $\mathcal{G}_1$
if and only if $X_1 \perp\!\!\!\perp X_4 \mid X_2$ and $X_1 \perp\!\!\!\perp X_3$
under $P$.
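The claim that exactly two m-separations hold can be checked by brute force. The sketch below enumerates every path between each pair of vertices and applies Definition 2.1 directly. Figure 1 is not reproduced in this text, so the edge set used here ($1 \to 2 \to 4$ and $2 \leftrightarrow 3 \leftrightarrow 4$) is an assumption inferred from the head-tail pairs and ancestor sets quoted later for $\mathcal{G}_1$; all function names are illustrative only.

```python
from itertools import combinations

# Assumed edge set for G_1 (the figure itself is not available in the text):
# 1 -> 2 -> 4 and 2 <-> 3 <-> 4.
VERTICES = {1, 2, 3, 4}
DIRECTED = {(1, 2), (2, 4)}                          # (a, b) means a -> b
BIDIRECTED = {frozenset({2, 3}), frozenset({3, 4})}  # {a, b} means a <-> b


def ancestors(targets):
    """Return an(C): every vertex with a directed path into C, plus C itself."""
    result = set(targets)
    changed = True
    while changed:
        changed = False
        for a, b in DIRECTED:
            if b in result and a not in result:
                result.add(a)
                changed = True
    return result


def edges_between(a, b):
    """Yield (arrowhead_at_a, arrowhead_at_b) for each edge joining a and b."""
    if (a, b) in DIRECTED:
        yield (False, True)
    if (b, a) in DIRECTED:
        yield (True, False)
    if frozenset({a, b}) in BIDIRECTED:
        yield (True, True)


def simple_paths(v, w, visited=frozenset()):
    """Yield each path from v to w as a list of (a, b, head_at_a, head_at_b)."""
    visited = visited | {v}
    if v == w:
        yield []
        return
    for nxt in sorted(VERTICES - visited):
        for head_here, head_next in edges_between(v, nxt):
            for rest in simple_paths(nxt, w, visited):
                yield [(v, nxt, head_here, head_next)] + rest


def is_blocked(path, conditioning):
    """Apply conditions (i) and (ii) of Definition 2.1 to a single path."""
    an_c = ancestors(conditioning)
    for (_, c, _, head_in), (_, _, head_out, _) in zip(path, path[1:]):
        collider = head_in and head_out
        if collider and c not in an_c:
            return True   # (ii): a collider that is not in an(C)
        if not collider and c in conditioning:
            return True   # (i): a non-collider that is in C
    return False


def m_separated(v, w, conditioning):
    """True if every path from v to w is blocked by the conditioning set."""
    return all(is_blocked(p, conditioning) for p in simple_paths(v, w))


def all_separations():
    """Collect every (v, w, C) with v < w such that v, w are m-separated given C."""
    found = set()
    for v, w in combinations(sorted(VERTICES), 2):
        rest = sorted(VERTICES - {v, w})
        for size in range(len(rest) + 1):
            for c in combinations(rest, size):
                if m_separated(v, w, set(c)):
                    found.add((v, w, frozenset(c)))
    return found
```

Under the assumed edge set, `all_separations()` returns only the two relations named in the text: 1 and 3 given the empty set, and 1 and 4 given {2}. Path enumeration is exponential in general, so this is only suitable for very small graphs.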

Definition 2.3. Let $\mathcal{G}$ be an ADMG with a vertex $v$, and an ancestral set
$A$ such that $v \in \mathrm{barren}_{\mathcal{G}}(A)$. Define

$$\mathrm{mb}(v, A) = \mathrm{pa}_{\mathcal{G}}(\mathrm{dis}_A(v)) \cup (\mathrm{dis}_A(v) \setminus \{v\})$$

to be the Markov blanket for $v$ in the induced subgraph on $A$.

Let $<$ be a topological ordering on the vertices of $\mathcal{G}$, meaning that no
vertex appears before any of its ancestors; let $\mathrm{pre}_{\mathcal{G},<}(v)$ be
the set of vertices preceding $v$ in the ordering. A probability measure $P$ is said
to satisfy the ordered local Markov property for $\mathcal{G}$ with respect to $<$,
if for any $v$ and ancestral set $A$ such that
$v \in A \subseteq \mathrm{pre}_{\mathcal{G},<}(v)$,

$$v \perp\!\!\!\perp A \setminus (\mathrm{mb}(v, A) \cup \{v\}) \mid \mathrm{mb}(v, A)$$

with respect to $P$.



Proposition 2.4 (Richardson (2003), Theorem 2). Let $\mathcal{G}$ be an ADMG, and $<$
a topological ordering of its vertices; further let $P$ be a probability measure on
$\mathfrak{X}_V$. The following are equivalent:

(i) $P$ obeys the global Markov property with respect to $\mathcal{G}$;

(ii) $P$ obeys the ordered local Markov property with respect to $\mathcal{G}$ and $<$.

In particular this result implies that if the ordered local Markov property is
satisfied for some topological ordering $<$, then it is satisfied for all such
orderings. For the ADMG $\mathcal{G}_1$, for example, we can use the topological
ordering 1, 2, 3, 4,



3 Partitions and Partial Orderings 

The global Markov property for DAGs can be equivalently stated in terms of a 
simple factorization criterion applied to the joint distribution. In order to achieve 
something similar for ADMGs, we will need to consider partitions of sets of vertices 
into appropriate blocks. This section develops the necessary mathematical theory 
on partition functions. 

Let $V$ be an arbitrary finite set, and let $\mathcal{H}$ be a collection of
non-empty subsets of $V$, with the restriction that $\{v\} \in \mathcal{H}$ for all
$v \in V$ (i.e. all singletons are in $\mathcal{H}$). Let $\prec$ be a partial
ordering on the elements of $\mathcal{H}$, and write $H_1 \preceq H_2$ to mean
that either $H_1 \prec H_2$ or $H_1 = H_2$.

Definition 3.1. We say that $\prec$ is partition-suitable (for $\mathcal{H}$) if for
any $H_1, H_2 \in \mathcal{H}$ with $H_1 \cap H_2 \neq \emptyset$, there exists
$H^* \in \mathcal{H}$ such that $H^* \subseteq H_1 \cup H_2$ and $H_i \preceq H^*$
for each $i = 1, 2$.

In other words partition-suitability ensures that any two intersecting elements of
$\mathcal{H}$ are dominated with respect to $\prec$ by some element of $\mathcal{H}$.

Define a function $\Phi$ on subsets of $V$ such that $\Phi(W)$ 'picks out' the
$\prec$-maximal elements of $\mathcal{H}$ which are subsets of $W$. That is,

$$\Phi(W) = \{H \in \mathcal{H} \mid H \subseteq W \text{ and } H \nprec H' \text{ for all other } H' \subseteq W\}.$$

Proposition 3.2. If $\prec$ is partition-suitable and $H_1, H_2 \in \Phi(A)$ are
distinct for some set $A$, then $H_1 \cap H_2 = \emptyset$.

Proof. This is immediate from the definition of partition-suitable. □

Lemma 3.3. Let $\prec$ be partition-suitable, $A \subseteq V$ and $H \in \Phi(A)$ be
a set removed from $A$ at the first stage of the partition. If
$H \subseteq B \subseteq A$ for some subset $B$, then $H \in \Phi(B)$.

Proof. Let $\mathcal{H}_A$ be the set of subsets in $\mathcal{H}$ which are contained
within $A$. If $H \in \Phi(A) \subseteq \mathcal{H}_A$ then $H$ is maximal with
respect to $\prec$ in $\mathcal{H}_A$. It is trivial that
$\mathcal{H}_B \subseteq \mathcal{H}_A$, and so $H$ is also maximal in
$\mathcal{H}_B$. Thus $H \in \Phi(B)$. □

Now let

$$\psi(W) = W \setminus \bigcup_{C \in \Phi(W)} C,$$

i.e. $\psi$ returns those elements of $W$ which are not contained in any set in
$\Phi(W)$. Then recursively define a partition function $[\cdot]$ on subsets of $V$
by $[\emptyset] = \emptyset$, and

$$[W] = \Phi(W) \cup [\psi(W)].$$

The idea is that the function $\Phi$ 'removes' some maximal sets from $W$, and the
procedure is then applied again to $\psi(W)$. The following proposition shows that
each vertex of $W$ is contained within precisely one set in $[W]$.

Proposition 3.4. If $\prec$ is partition-suitable then the function $[\cdot]$
partitions sets. That is, for any $W \subseteq V$,

$$\bigcup_{H \in [W]} H = W,$$

and if $A, B \in [W]$ then either $A = B$ or $A \cap B = \emptyset$.

Proof. We proceed by induction on the size of $W$. If $W = \emptyset$ the result
follows from the definition. Also by definition, if $W \neq \emptyset$ then

$$[W] = \Phi(W) \cup [\psi(W)],$$

so the induction hypothesis and the definitions of $\Phi$ and $\psi$ mean we need
only prove that $\Phi(W)$ is non-empty and contains disjoint sets.

$\prec$ is a partial ordering, so there is always at least one maximal element (since
$V$ is finite). Further, $\Phi$ takes only maximal sets; it therefore follows from
the definition of partition-suitable that these sets are disjoint. □

The next proposition shows that partition functions as we have defined them are
stable when some set in the partition is removed.

Proposition 3.5. If $C \in [W]$, then $[W] = \{C\} \cup [W \setminus C]$.

Proof. We proceed by induction on the size of $W$. If $[W] = \{C\}$, including any
case in which $|W| = 1$, the result is trivial.

If $C$ is not maximal with respect to $\prec$ in $W$, then
$\Phi(W) = \Phi(W \setminus C)$, and so

$$[W] = \Phi(W) \cup [\psi(W)] = \Phi(W \setminus C) \cup [\psi(W)],$$

and the problem reduces to showing that

$$[\psi(W)] = \{C\} \cup [\psi(W \setminus C)] = \{C\} \cup [\psi(W) \setminus C].$$

Thus, without loss of generality, $C \in \Phi(W)$.

Now, clearly $\Phi(W \setminus C) \cup \{C\} \supseteq \Phi(W)$, and if equality
holds we are done. Otherwise let $C_1, \ldots, C_k$ be the sets in
$\Phi(W \setminus C)$ but not in $\Phi(W)$. Note that by definition,
$C_1, \ldots, C_k \subseteq \psi(W)$. Further, these sets are maximal in
$W \setminus C$, so by Lemma 3.3 they are also maximal in $\psi(W)$, that is, they
lie in $\Phi(\psi(W))$. Then the problem reduces to showing that

$$[\psi(W)] = \{C_1, \ldots, C_k\} \cup [\psi(W) \setminus (C_1 \cup \cdots \cup C_k)],$$

which follows from repeated application of the induction hypothesis. □

Lastly we show that if each set in $\mathcal{H}$ lies within the elements of some
partition of $V$, then the partition function can be applied separately to each piece
of this coarser partition.

Proposition 3.6. Let $D_1, \ldots, D_k$ be a partition of $V$, and suppose that every
$H \in \mathcal{H}$ is contained within some $D_i$. Let $\prec$ be a
partition-suitable partial ordering, then

$$[W] = \bigcup_{i=1}^{k} [W \cap D_i].$$

Proof. We prove the case $k = 2$, from which the general result follows by repeated
applications, and argue by induction on the size of $W$. If either of $W \cap D_1$ or
$W \cap D_2$ is empty, then the result is trivial. By definition

$$[W] = \Phi(W) \cup [\psi(W)];$$

$\psi(W)$ is strictly smaller than $W$, so by the induction hypothesis

$$[W] = \Phi(W) \cup [\psi(W) \cap D_1] \cup [\psi(W) \cap D_2].$$

$\Phi(W) = \mathcal{C}_1 \cup \mathcal{C}_2$ where each $H \in \mathcal{C}_i$ is a
subset of $D_i$; since the elements of $\mathcal{C}_i$ are maximal with respect to
$\prec$ in $W$, by Lemma 3.3 they are also maximal in $W \cap D_i$. Hence
$\mathcal{C}_i \subseteq \Phi(W \cap D_i)$, and then repeatedly applying
Proposition 3.5 gives

$$\mathcal{C}_i \cup [\psi(W) \cap D_i] = [W \cap D_i],$$

because $(\psi(W) \cap D_i) \cup \bigcup_{C \in \mathcal{C}_i} C = W \cap D_i$. Hence
the result. □



4 The Factorization Criterion 

Let $P$ be a probability measure having density
$f_V : \mathfrak{X}_V \to \mathbb{R}$ with respect to some $\sigma$-finite dominating
measure $\mu$ on $\mathfrak{X}_V$. For $U, W \subseteq V$, we denote by
$f_W : \mathfrak{X}_W \to \mathbb{R}$ the marginal density over $W$, and by
$f_{W|U}(\cdot \mid u) : \mathfrak{X}_W \to \mathbb{R}$ for $f_U(u) > 0$ the
conditional density of $W$ given $U = u$ (more precisely: any member of the
equivalence class of such densities). Then $P$ obeys the global Markov property
with respect to a DAG if and only if it factorizes as

$$f_V(x_V) = \prod_{v \in V} f_{v|\mathrm{pa}(v)}(x_v \mid x_{\mathrm{pa}(v)}),$$

for $\mu$-almost all $x_V \in \mathfrak{X}_V$ (see, for example, Lauritzen, 1996). In
the sequel, all equalities over $f$ are considered to hold almost everywhere with
respect to $\mu$.

In this section we show that factorizations can also be used to characterize Markov 
models over ADMGs; however, as we shall see, the criterion is more complicated 
than that for DAGs. 

Example 4.1. Consider the ADMG in Figure 1. A distribution which obeys
the global Markov property with respect to this graph satisfies
$X_1 \perp\!\!\!\perp X_3$ and $X_1 \perp\!\!\!\perp X_4 \mid X_2$. It is not
possible to specify a factorization on the joint distribution of $X_1$, $X_2$, $X_3$
and $X_4$ which implies precisely these two independences. Instead, we require
factorizations of certain marginal distributions:

$$f_{13}(x_1, x_3) = f_1(x_1) \cdot f_3(x_3),$$

$$f_{124}(x_1, x_2, x_4) = f_1(x_1) \cdot f_{2|1}(x_2 \mid x_1) \cdot f_{4|2}(x_4 \mid x_2).$$

In this section we will see how such marginal factorizations can be used to
represent distributions which obey the global Markov property with respect to an
ADMG.
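The two marginal factorizations in Example 4.1 can be checked numerically on a distribution obtained by marginalizing a DAG with hidden variables. The sketch below is only an illustration under stated assumptions: the hidden-variable DAG ($1 \to 2 \to 4$, $u \to 2$, $u \to 3$, $w \to 3$, $w \to 4$, with binary hidden $u$ and $w$) is a hypothetical choice whose observed margin is assumed to correspond to $\mathcal{G}_1$, and the conditional probabilities are drawn at random. The second check uses the identity $f_1 f_{2|1} f_{4|2} = f_{12} f_{24} / f_2$.

```python
import itertools
import random

random.seed(0)


def random_cpt(n_parents):
    """Map each binary parent configuration to P(child = 1 | parents)."""
    return {pa: random.uniform(0.1, 0.9)
            for pa in itertools.product((0, 1), repeat=n_parents)}


def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


# Hypothetical hidden-variable DAG: 1 -> 2 -> 4, u -> 2, u -> 3, w -> 3, w -> 4.
cpt = {"u": random_cpt(0), "w": random_cpt(0), 1: random_cpt(0),
       2: random_cpt(2), 3: random_cpt(2), 4: random_cpt(2)}

# Joint distribution of the observed (x1, x2, x3, x4), summing out u and w.
joint = {}
for x1, x2, x3, x4 in itertools.product((0, 1), repeat=4):
    total = 0.0
    for u, w in itertools.product((0, 1), repeat=2):
        total += (bern(cpt["u"][()], u) * bern(cpt["w"][()], w)
                  * bern(cpt[1][()], x1)
                  * bern(cpt[2][(x1, u)], x2)
                  * bern(cpt[3][(u, w)], x3)
                  * bern(cpt[4][(x2, w)], x4))
    joint[(x1, x2, x3, x4)] = total


def marginal(keep):
    """Marginal over the observed vertices in `keep` (labels 1..4, in order)."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[v - 1] for v in keep)
        out[key] = out.get(key, 0.0) + p
    return out


f1, f2, f3 = marginal([1]), marginal([2]), marginal([3])
f12, f13 = marginal([1, 2]), marginal([1, 3])
f24, f124 = marginal([2, 4]), marginal([1, 2, 4])

# f_13 = f_1 * f_3
ok_13 = all(abs(f13[(a, c)] - f1[(a,)] * f3[(c,)]) < 1e-12
            for (a, c) in f13)

# f_124 = f_1 * f_{2|1} * f_{4|2}, written as f_12 * f_24 / f_2
ok_124 = all(abs(f124[(a, b, d)] - f12[(a, b)] * f24[(b, d)] / f2[(b,)]) < 1e-12
             for (a, b, d) in f124)
```

Both flags come out true for any choice of conditional probability tables in this hidden-variable DAG, since the two independences follow from d-separation in the larger graph rather than from the particular numbers drawn.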
Definition 4.2 (Head). A vertex set $H \subseteq V$ is a head if it is barren in
$\mathcal{G}$ and $H$ is contained within a single district of
$\mathcal{G}_{\mathrm{an}(H)}$. We write $\mathcal{H}(\mathcal{G})$ for the
collection of all heads in $\mathcal{G}$.

Example 4.3. For the ADMG shown in Figure 2(b) we have the following:

$$\mathcal{H}(\mathcal{G}) = \{\{0\}, \{1\}, \{2\}, \{3\}, \{4\}, \{0,1\}, \{0,2\}, \{1,4\}, \{2,3\}, \{0,1,2\}, \{0,1,4\}, \{0,2,3\}, \{0,3,4\}\}.$$

Notice that although they are contained within a single district, the sets
$\{0, 1, 2, 4\}$, $\{0, 1, 2, 3\}$ and $\{0, 1, 2, 3, 4\}$ do not form heads because
they are not barren. Also observe that $\{0, 3, 4\}$ does form a head, even though
the induced subgraph $\mathcal{G}_{\{0,3,4\}}$ is not connected (because
$\{0, 3, 4\}$ is a subset of a single district in
$\mathcal{G}_{\mathrm{an}(\{0,3,4\})}$ as required).

Definition 4.4 (Tail). For any head $H$, the tail of $H$ is the set

$$\mathrm{tail}_{\mathcal{G}}(H) = (\mathrm{dis}_{\mathrm{an}(H)}(H) \setminus H) \cup \mathrm{pa}_{\mathcal{G}}(\mathrm{dis}_{\mathrm{an}(H)}(H)).$$

We denote the first set in this union by $\text{dis-tail}_{\mathcal{G}}(H)$, and the
second by $\text{pa-tail}_{\mathcal{G}}(H)$; these sets need not be disjoint. If the
context makes it clear which head we are referring to, we will sometimes denote a
tail simply by $T$.

Note that the tail is a subset of the ancestors of the head. An intuitive
interpretation of heads and tails is that a head $H$ is a set within which no
independence relations hold without marginalizing some elements of $H$, and the tail
is the Markov blanket for $H$ within the set $\mathrm{an}_{\mathcal{G}}(H)$. We will
see, therefore, that we can factorize ancestral sets into heads conditional upon
their tail sets.

Example 4.5. In the special case of a DAG, the heads are all singleton vertices
$\{v\}$, and the tails are the sets of parents $\mathrm{pa}_{\mathcal{G}}(v)$. In a
purely bidirected graph, the heads are just the connected sets, and the tails are
all empty.

Example 4.6. The graph $\mathcal{G}_1$ in Figure 1 has the following head-tail pairs:

    H    {1}    {2}    {3}    {2,3}    {4}    {3,4}
    T    ∅      {1}    ∅      {1}      {2}    {1,2}

Note that the set $\{2, 3, 4\}$ is not a head, because it is not barren.
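Definitions 4.2 and 4.4 translate directly into a brute-force enumeration over vertex subsets. The following sketch assumes the edge set $1 \to 2 \to 4$, $2 \leftrightarrow 3 \leftrightarrow 4$ for $\mathcal{G}_1$ (the figure is not reproduced here, so this is an inference from the quoted ancestor sets), and takes a set to be barren when no vertex in it has a proper descendant in the set. Function names are illustrative, not part of any library.

```python
from itertools import combinations

# Assumed edges of G_1: 1 -> 2 -> 4 and 2 <-> 3 <-> 4.
VERTICES = [1, 2, 3, 4]
DIRECTED = {(1, 2), (2, 4)}
BIDIRECTED = {frozenset({2, 3}), frozenset({3, 4})}


def parents(vertex_set):
    """pa(S): all vertices with a directed edge into some member of S."""
    return {a for a, b in DIRECTED if b in vertex_set}


def ancestors(vertex_set):
    """an(S): S together with everything that has a directed path into S."""
    result, frontier = set(vertex_set), set(vertex_set)
    while frontier:
        frontier = parents(frontier) - result
        result |= frontier
    return result


def district(v, within):
    """dis_A(v): vertices joined to v by bidirected paths staying inside A."""
    comp, stack = {v}, [v]
    while stack:
        x = stack.pop()
        for edge in BIDIRECTED:
            if x in edge and edge <= within:
                (y,) = edge - {x}
                if y not in comp:
                    comp.add(y)
                    stack.append(y)
    return comp


def is_barren(h):
    """True if no vertex of h is a proper ancestor of another vertex of h."""
    return all(ancestors({v}) & h == {v} for v in h)


def is_head(h):
    """Definition 4.2: barren, and inside one district of the graph on an(h)."""
    h = set(h)
    return is_barren(h) and h <= district(min(h), ancestors(h))


def tail(h):
    """Definition 4.4: (dis_an(H)(H) minus H) together with pa(dis_an(H)(H))."""
    h = set(h)
    dist = district(min(h), ancestors(h))
    return (dist - h) | parents(dist)


def head_tail_pairs():
    """Map every head of the graph to its tail."""
    pairs = {}
    for size in range(1, len(VERTICES) + 1):
        for h in combinations(VERTICES, size):
            if is_head(h):
                pairs[frozenset(h)] = frozenset(tail(h))
    return pairs
```

With the assumed edges, `head_tail_pairs()` yields six heads, and the non-empty tails are {1}, {1}, {2} and {1,2} for the heads {2}, {2,3}, {4} and {3,4}, in agreement with the entries of Example 4.6.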



In general, it is not possible to order the vertices in an acyclic directed mixed
graph such that, for each head $H$, all the vertices in
$\mathrm{pa}_{\mathcal{G}}(H)$ precede all the vertices in $H$. A counterexample is
given in Figure 2(a), which is taken from Richardson (2009). The head $\{1,4\}$ has
parent 2, while the head $\{2,3\}$ has parent 1; whichever way we order the vertices
1 and 2, the condition will be violated.

Figure 2: (a) An ADMG in which there is no vertex ordering such that all parents
of a head precede every vertex in the head; (b) $\{0, 3, 4\}$ forms a head in this
ADMG, but the induced subgraph on $\{0, 3, 4\}$ is not connected.

However, there is a well-defined partial ordering on heads which will be useful to
us, and which satisfies the essential property of partition-suitability from
Section 3.

Definition 4.7. For two distinct heads $H_i$ and $H_j$ in an ADMG $\mathcal{G}$, say
that $H_i \prec H_j$ if $H_i \subseteq \mathrm{an}_{\mathcal{G}}(H_j)$.

Lemma 4.8. The (strict) partial ordering $\prec$ is well-defined.

Proof. We need to verify that $\prec$ is irreflexive, asymmetric and transitive;
irreflexivity is by definition. Asymmetry amounts to
$H_i \prec H_j \implies H_j \nprec H_i$; suppose not for contradiction, so that there
exist distinct heads $H_i$ and $H_j$ with $H_i \prec H_j$ and $H_j \prec H_i$. Since
$H_i$ and $H_j$ are distinct, there exists a vertex $v$ which is in one of these
heads but not the other; assume without loss of generality that
$v \in H_j \setminus H_i$.

Since $H_j \subseteq \mathrm{an}_{\mathcal{G}}(H_i)$, we can find a directed path
$\pi_1$ from $v$ to some vertex $w \in H_i$; the path is non-empty because
$v \notin H_i$. However, since we also have
$H_i \subseteq \mathrm{an}_{\mathcal{G}}(H_j)$, we can find a (possibly empty)
directed path $\pi_2$ from $w$ to some $x \in H_j$. Now, the concatenation of $\pi_1$
and $\pi_2$ is also a path, because any repeated vertices would imply a directed
cycle in the graph. Call this new path $\pi$.

But $\pi$ is a non-empty directed path between two vertices in $H_j$, which violates
the requirement that heads are barren. Hence asymmetry holds.

For transitivity, if $H_i \prec H_j$ and $H_j \prec H_k$, then clearly we can find a
directed path from any element $v \in H_i$ to some element of $H_k$, simply by
concatenating paths from $v \in H_i$ to some $w \in H_j$ and from $w$ to $H_k$. Hence
$H_i \subseteq \mathrm{an}_{\mathcal{G}}(H_k)$, and so $H_i \prec H_k$. □

Lemma 4.9. The partial ordering $\prec$ on the heads $\mathcal{H}(\mathcal{G})$ of an
ADMG is partition-suitable.

Proof sketch; see Appendix for details. If two heads $H_1, H_2 \subseteq W$ are
distinct and $H_1 \cap H_2 \neq \emptyset$, then
$H^* = \mathrm{barren}_{\mathcal{G}}(H_1 \cup H_2)$ is a head, $H_1 \preceq H^*$ and
$H_2 \preceq H^*$. □

Note that in general $H^*$ may be a strict subset of $H_1 \cup H_2$. For example
consider the graph shown in Figure 2(b). Let $W = \{0,1,2,3,4\}$,
$H_1 = \{0,1,4\}$ and $H_2 = \{0,2,3\}$. Now
$H_1 \cup H_2 \notin \mathcal{H}(\mathcal{G})$ and $H_1 \cap H_2 = \{0\}$. However,
$H^* = \mathrm{barren}_{\mathcal{G}}(H_1 \cup H_2) = \{0, 3, 4\} \subsetneq H_1 \cup H_2$.

Denote the relevant functions from Section 3 defined by this partial ordering
by $\Phi_{\mathcal{G}}$, $\psi_{\mathcal{G}}$ and $[\cdot]_{\mathcal{G}}$
respectively. This partition function allows us to factorize probabilities for ADMGs
into expressions based upon heads and tails.

Example 4.10. For the graph $\mathcal{G}_1$ in Figure 1 we have

    H          {1}    {2}      {3}    {2,3}      {4}        {3,4}
    an_G(H)    {1}    {1,2}    {3}    {1,2,3}    {1,2,4}    {1,2,3,4}

so that

$$\{1\} \prec \{2\} \prec \{2,3\} \prec \{3,4\}, \qquad \{2\} \prec \{4\} \prec \{3,4\}, \qquad \{3\} \prec \{2,3\}.$$

Then, for example, $\Phi_{\mathcal{G}_1}(\{2,3,4\}) = \{\{3,4\}\}$, and
$\Phi_{\mathcal{G}_1}(\{2\}) = \{\{2\}\}$, giving

$$[\{2,3,4\}]_{\mathcal{G}_1} = \{\{3,4\}, \{2\}\}.$$
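The recursion $[W] = \Phi(W) \cup [\psi(W)]$ is short enough to write out directly. The sketch below hard-codes the heads of $\mathcal{G}_1$ and their ancestor sets from the table in Example 4.10, defines $\prec$ as in Definition 4.7, and reproduces the partition of $\{2,3,4\}$; the function names `precedes`, `phi`, `psi` and `partition` are illustrative choices.

```python
# Heads of G_1 and their ancestor sets, copied from the table in Example 4.10.
HEADS_AN = {
    frozenset({1}): {1},
    frozenset({2}): {1, 2},
    frozenset({3}): {3},
    frozenset({2, 3}): {1, 2, 3},
    frozenset({4}): {1, 2, 4},
    frozenset({3, 4}): {1, 2, 3, 4},
}


def precedes(h1, h2):
    """Definition 4.7: h1 precedes h2 if they are distinct and h1 lies in an(h2)."""
    return h1 != h2 and h1 <= HEADS_AN[h2]


def phi(w):
    """The heads contained in w that are maximal among heads contained in w."""
    inside = [h for h in HEADS_AN if h <= w]
    return {h for h in inside
            if not any(precedes(h, other) for other in inside)}


def psi(w):
    """The vertices of w not covered by any set in phi(w)."""
    return w - frozenset().union(*phi(w))


def partition(w):
    """The partition function [W] = phi(W) united with [psi(W)]."""
    w = frozenset(w)
    if not w:
        return set()
    return phi(w) | partition(psi(w))
```

For instance, `partition({2, 3, 4})` returns the two blocks {3,4} and {2}, and `partition({1, 3})` returns {1} and {3}, since neither singleton precedes the other. The recursion terminates because every singleton is a head, so `phi` removes at least one vertex at each stage.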

Example 4.11. For the graph in Figure 2(a), we have

    H          {1}    {2}    {1,4}      {2,3}
    an_G(H)    {1}    {2}    {1,2,4}    {1,2,3}

Thus $\{1\} \prec \{1,4\}$, and $\{2\} \prec \{2,3\}$.

Now we can provide a factorization criterion for acyclic directed mixed graphs.

Theorem 4.12. Let $\mathcal{G}$ be an ADMG, and $P$ a probability distribution on
$\mathfrak{X}_V$ with density $f_V$. Then $P$ obeys the global Markov property with
respect to $\mathcal{G}$ if and only if for every ancestral set
$A \in \mathcal{A}(\mathcal{G})$, and $\mu$-almost all $x_A \in \mathfrak{X}_A$,

$$f_A(x_A) = \prod_{H \in [A]_{\mathcal{G}}} f_{H|T}(x_H \mid x_T). \tag{1}$$

A formal proof of this result is given in the Appendix; a sketch proof is given in
Richardson (2009), Theorem 4.

Example 4.13. For the graph in Figure 2(a), observe that the global Markov
property implies precisely that $X_3 \perp\!\!\!\perp X_4 \mid X_{12}$, and
$X_1 \perp\!\!\!\perp X_2$. Theorem 4.12 gives us the following factorization of the
density:

$$f_{1234}(x_1, x_2, x_3, x_4) = f_{23|1}(x_2, x_3 \mid x_1) \cdot f_{14|2}(x_1, x_4 \mid x_2)$$

for all $x_i \in \mathfrak{X}_i$, $i = 1, \ldots, 4$. Though it may appear slightly
strange, since the first term gives the density for $\{X_2, X_3\}$ given $X_1$, while
the second is for $\{X_1, X_4\}$ given $X_2$, this factorization does indeed imply
that $X_3 \perp\!\!\!\perp X_4 \mid X_{12}$. Further, integrating out $X_3$ and $X_4$
gives

$$f_{12}(x_1, x_2) = f_{2|1}(x_2 \mid x_1) \cdot f_{1|2}(x_1 \mid x_2), \qquad x_1 \in \mathfrak{X}_1, \; x_2 \in \mathfrak{X}_2,$$

which implies that $X_1 \perp\!\!\!\perp X_2$.
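The last implication can be spelled out in one step. The derivation below is a sketch under the simplifying assumption that $f_{12}$ is strictly positive, so that division by $f_{2|1}$ is allowed; it compares the marginal factorization above with the usual definition of the conditional density $f_{2|1}$.

```latex
\begin{align*}
f_{12}(x_1,x_2) &= f_{1}(x_1)\, f_{2|1}(x_2 \mid x_1)
   && \text{(definition of } f_{2|1}\text{)} \\
f_{12}(x_1,x_2) &= f_{2|1}(x_2 \mid x_1)\, f_{1|2}(x_1 \mid x_2)
   && \text{(margin of the factorization)} \\
\Longrightarrow \quad f_{1|2}(x_1 \mid x_2) &= f_{1}(x_1)
   && \text{(divide by } f_{2|1}(x_2 \mid x_1) > 0\text{)}
\end{align*}
```

Since the conditional density of $X_1$ given $X_2 = x_2$ does not depend on $x_2$, the two variables are independent.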

Remark 4.14. It follows from Theorem 4.12 that if $H$ is a head,
$\mathrm{tail}_{\mathcal{G}}(H)$ is the Markov blanket for $H$ in the set
$\mathrm{an}_{\mathcal{G}}(H)$, in the sense that under the global Markov property,

$$H \perp\!\!\!\perp \mathrm{an}_{\mathcal{G}}(H) \setminus (H \cup \mathrm{tail}_{\mathcal{G}}(H)) \mid \mathrm{tail}_{\mathcal{G}}(H). \tag{2}$$

Remark 4.15. A different, incorrect definition of $\Phi_{\mathcal{G}}$ (and
therefore $\psi_{\mathcal{G}}$, $[\cdot]_{\mathcal{G}}$) was given in Richardson
(2009) and Evans and Richardson. The incorrect definition coincides with that given
here when $W$ is ancestral, so Equation (1) holds for both. However, Equation (3)
below does not hold for the incorrect partition in general.



5 Towards a Parametrization of the Discrete Markov 
Model for an ADMG 

The factorizations in Theorem 4.12 can be used to produce a parametrization of
ADMG models when $\mathfrak{X}_V$ is a finite set, and thus the relevant random
variables are discrete. For simplicity of exposition we will henceforth assume that
the random variables are binary, and thus that $\mathfrak{X}_V = \{0, 1\}^{|V|}$;
extension to the general finite discrete case is easy but notationally challenging.

Theorem 5.1. Let $\mathcal{G}$ be an ADMG, and $P$ a probability distribution on
$\{0, 1\}^{|V|}$. Then $P$ obeys the global Markov property with respect to
$\mathcal{G}$ if and only if for any ancestral set $A$

$$P(X_A = x_A) = \sum_{C : O \subseteq C \subseteq A} (-1)^{|C \setminus O|} \prod_{H \in [C]_{\mathcal{G}}} P(X_H = 0 \mid X_T = x_T), \tag{3}$$

where $O = \{v \in A \mid x_v = 0\}$ and the empty product is taken to be 1.
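To see how the alternating sum in (3) works, it may help to expand one case by hand. The sketch below takes the ancestral set $A = \{1,2,3\}$ in $\mathcal{G}_1$ and $x_A = (1,1,1)$, so that $O = \emptyset$ and the sum runs over all eight subsets $C \subseteq A$. It assumes the heads $\{1\}, \{2\}, \{3\}, \{2,3\}$ with tails $\emptyset, \{1\}, \emptyset, \{1\}$, and uses the abbreviation $q_H(x_T)$ for $P(X_H = 0 \mid X_T = x_T)$.

```latex
% Partitions used: [{1,2}] = {{2},{1}}, [{1,3}] = {{1},{3}},
% [{2,3}] = {{2,3}}, [{1,2,3}] = {{2,3},{1}}; the tails {1} are evaluated at x_1 = 1.
\begin{align*}
P\bigl(X_{123} = (1,1,1)\bigr)
  &= 1 - q_1 - q_2(1) - q_3 + q_1 q_2(1) + q_1 q_3 + q_{23}(1) - q_1 q_{23}(1) \\
  &= (1 - q_1)\,\bigl(1 - q_2(1) - q_3 + q_{23}(1)\bigr).
\end{align*}
```

The first factor is $P(X_1 = 1)$. The second is the inclusion-exclusion expression for $P(X_2 = 1, X_3 = 1 \mid X_1 = 1)$, where $q_3$ can stand in for $P(X_3 = 0 \mid X_1 = 1)$ because $X_1 \perp\!\!\!\perp X_3$ under the model.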



Note that the sets $C$ in (3) may not be ancestral, which hinders proof by induction.
Theorem 5.1 suggests that conditional probabilities of the form
$P(X_H = 0 \mid X_T = x_T)$ form a parametrization of the binary ADMG model; this is
proved in Section 6.

The following result, due to Evans and Richardson (2010), shows that the summation
in (3) can be factorized into districts.



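Lemma 5.2. Suppose $D_1 \cup D_2 \cup \cdots \cup D_k = D$ and that for each pair
$D_i$ and $D_j$, $i \neq j$, there are no bidirected paths from $D_i$ to $D_j$ in
$\mathcal{G}_{\mathrm{an}(D)}$. Further let $O_i = O \cap D_i$ for each $i$. Then

$$\sum_{C : O \subseteq C \subseteq D} (-1)^{|C \setminus O|} \prod_{H \in [C]_{\mathcal{G}}} P(X_H = 0 \mid X_T = x_T)
 = \prod_{i=1}^{k} \sum_{C : O_i \subseteq C \subseteq D_i} (-1)^{|C \setminus O_i|} \prod_{H \in [C]_{\mathcal{G}}} P(X_H = 0 \mid X_T = x_T). \tag{4}$$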

The induction argument we will use in the proof of Theorem 5.1 requires the
following definition and lemma.

Definition 5.3. Let $\mathcal{G}$ be an ADMG, and $W$ be a subset of its vertices. We
say $W$ is an ancestrally closed district for $\mathcal{G}$ if $W$ is a district in
$\mathcal{G}_{\mathrm{an}_{\mathcal{G}}(W)}$.

Lemma 5.4. If $P$ obeys the global Markov property with respect to $\mathcal{G}$,
then for every ancestrally closed district $D$ in $\mathcal{G}$ and
$v \in \mathrm{barren}_{\mathcal{G}}(D)$,

$$\sum_{x_v} \prod_{H \in [D]_{\mathcal{G}}} P(X_H = x_H \mid X_T = x_T) = \prod_{H \in [D \setminus \{v\}]_{\mathcal{G}}} P(X_H = x_H \mid X_T = x_T). \tag{5}$$

Proof. Suppose that the GMP is satisfied, and let
$A = \mathrm{an}_{\mathcal{G}}(D)$. Then

$$\prod_{H' \in [A \setminus D]_{\mathcal{G}}} P(X_{H'} = x_{H'} \mid X_{T'} = x_{T'}) \sum_{x_v} \prod_{H \in [D]_{\mathcal{G}}} P(X_H = x_H \mid X_T = x_T)$$

$$= \sum_{x_v} \prod_{H \in [D]_{\mathcal{G}}} P(X_H = x_H \mid X_T = x_T) \prod_{H' \in [A \setminus D]_{\mathcal{G}}} P(X_{H'} = x_{H'} \mid X_{T'} = x_{T'})$$

$$= \sum_{x_v} \prod_{H \in [A]_{\mathcal{G}}} P(X_H = x_H \mid X_T = x_T)$$

by Proposition 3.6. Then by Theorem 4.12

$$= \sum_{x_v} P(X_A = x_A) = P(X_{A \setminus \{v\}} = x_{A \setminus \{v\}}),$$

which, since $A \setminus \{v\}$ is ancestral, is

$$= \prod_{H \in [A \setminus \{v\}]_{\mathcal{G}}} P(X_H = x_H \mid X_T = x_T)$$

$$= \prod_{H \in [D \setminus \{v\}]_{\mathcal{G}}} P(X_H = x_H \mid X_T = x_T) \prod_{H' \in [A \setminus D]_{\mathcal{G}}} P(X_{H'} = x_{H'} \mid X_{T'} = x_{T'}).$$

Then comparing the first and last expressions in this sequence gives (5). □

Proof of Theorem 5.1. Suppose that $P$ obeys the factorization in (1). We will
show that for any disjoint union of ancestrally closed districts $D$,

$$\prod_{H \in [D]_{\mathcal{G}}} P(X_H = x_H \mid X_T = x_T) = \sum_{C : O \subseteq C \subseteq D} (-1)^{|C \setminus O|} \prod_{H \in [C]_{\mathcal{G}}} P(X_H = 0 \mid X_T = x_T)$$

where $O = \{v \in D \mid x_v = 0\}$. Since ancestral sets are also disjoint unions of
ancestrally closed districts, this gives the 'only if' part of the statement. We
proceed by induction on the size of $D$ and the number of 1s in the vector $x_D$. If
$x_D = 0$ then the result is trivial, since the left and right hand sides are
identical; if $|D| = 1$ then this is just a trivial application of the laws of
probability.

Suppose that $x_D \neq 0$ and $|D| > 1$, and let $D = D_1 \cup \cdots \cup D_k$ for
disjoint ancestrally closed districts $D_1, \ldots, D_k$.

If $x_v = 0$ for all $v \in \mathrm{barren}_{\mathcal{G}}(D_1)$, then there is some
head $H \subseteq \mathrm{barren}_{\mathcal{G}}(D_1)$ which, by Lemma 3.3, appears in
$[C]_{\mathcal{G}}$ for all $O \subseteq C \subseteq D$; this means that we can
remove the factor $P(X_H = 0 \mid X_T = x_T)$ from both sides of the above
expression, and the problem is reduced to a strictly smaller disjoint union of
ancestrally closed districts, $D \setminus H$.

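Otherwise, let $x_v = 1$ for $v \in \mathrm{barren}_{\mathcal{G}}(D_1)$; then

$$\sum_{C : O_1 \subseteq C \subseteq D_1} (-1)^{|C \setminus O_1|} \prod_{H \in [C]_{\mathcal{G}}} P(X_H = 0 \mid X_T = x_T)$$

$$= \sum_{C : O_1 \subseteq C \subseteq D_1 \setminus \{v\}} (-1)^{|C \setminus O_1|} \prod_{H \in [C]_{\mathcal{G}}} P(X_H = 0 \mid X_T = x_T)
 \; - \sum_{C : O_1 \cup \{v\} \subseteq C \subseteq D_1} (-1)^{|C \setminus (O_1 \cup \{v\})|} \prod_{H \in [C]_{\mathcal{G}}} P(X_H = 0 \mid X_T = x_T)$$

$$= \prod_{H \in [D_1 \setminus \{v\}]} P(X_H = x_H \mid X_T = x_T) \; - \prod_{H \in [D_1]} P(X_H = x'_H \mid X_T = x_T),$$

where $x' = x$ except that $x'_v = 0$; this last expression follows from the
induction hypothesis applied to the first term because
$|D_1 \setminus \{v\}| < |D_1|$, and the second because $x'_{D_1}$ has strictly fewer
1s than $x_{D_1}$. By Lemma 5.4 this is just

$$= \prod_{H \in [D_1]} P(X_H = x_H \mid X_T = x_T).$$

Now, using Lemma 5.2,

$$\sum_{C : O \subseteq C \subseteq D} (-1)^{|C \setminus O|} \prod_{H \in [C]_{\mathcal{G}}} P(X_H = 0 \mid X_T = x_T)
 = \prod_{j} \sum_{C : O_j \subseteq C \subseteq D_j} (-1)^{|C \setminus O_j|} \prod_{H \in [C]_{\mathcal{G}}} P(X_H = 0 \mid X_T = x_T),$$

which by the above result for $D_1$ and application of the induction hypothesis to
$D_s$ for $s > 1$, is

$$= \prod_{j} \prod_{H \in [D_j]} P(X_H = x_H \mid X_T = x_T),$$

and so Proposition 3.6 gives

$$= \prod_{H \in [D]} P(X_H = x_H \mid X_T = x_T).$$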

For the converse result, suppose that $P$ satisfies the conditions given in (3); we
will show that it also satisfies the ordered local Markov property and therefore
the global Markov property for $\mathcal{G}$.

Let $A$ be an ancestral set and $v \in \mathrm{barren}_{\mathcal{G}}(A)$. Suppose
further that $A = D_1 \cup \cdots \cup D_k$ for disjoint ancestrally closed districts
$D_1, \ldots, D_k$. By Lemma 5.2 we have

$$P(X_A = x_A) = \sum_{C : O \subseteq C \subseteq A} (-1)^{|C \setminus O|} \prod_{H \in [C]} P(X_H = 0 \mid X_T = x_T)$$

$$= \prod_{j} \sum_{C : O_j \subseteq C \subseteq D_j} (-1)^{|C \setminus O_j|} \prod_{H \in [C]} P(X_H = 0 \mid X_T = x_T)$$

$$= \prod_{j=1}^{k} g_j(x_{D_j}, x_{\mathrm{pa}(D_j)}),$$

for some functions $g_j$. Now if $v \in D_1$, say, then $x_v$ appears only in the
function $g_1$ because $v \in \mathrm{barren}_{\mathcal{G}}(A)$. But we have

$$P(X_A = x_A) = g_1(x_v, x_{D_1 \setminus \{v\}}, x_{\mathrm{pa}(D_1)}) \prod_{j=2}^{k} g_j(x_{D_j}, x_{\mathrm{pa}(D_j)}),$$

which shows that

$$v \perp\!\!\!\perp A \setminus (D_1 \cup \mathrm{pa}_{\mathcal{G}}(D_1)) \mid (D_1 \setminus \{v\}) \cup \mathrm{pa}_{\mathcal{G}}(D_1).$$

Note also that $D_1 = \mathrm{dis}_A(v)$, so

$$v \perp\!\!\!\perp A \setminus (\{v\} \cup \mathrm{mb}(v, A)) \mid \mathrm{mb}(v, A)$$

where $\mathrm{mb}(v, A) = (\mathrm{dis}_A(v) \setminus \{v\}) \cup \mathrm{pa}_{\mathcal{G}}(\mathrm{dis}_A(v))$,
which is just the ordered local Markov property for $v$ and $A$. □



6 Model Smoothness 



Let $\mathcal{P}_{\mathcal{G}} \subseteq \Delta_{2^n - 1}$ denote the set of strictly
positive binary probability distributions which obey the global Markov property with
respect to an ADMG $\mathcal{G}$, where $\Delta_k$ is the strictly positive
$k$-dimensional probability simplex and $n$ is the number of vertices in
$\mathcal{G}$. We call $\mathcal{P}_{\mathcal{G}}$ the model defined by $\mathcal{G}$
on a binary state-space. In this section such models are shown to be smooth, in the
sense that they are curved exponential families of distributions, and we prove that
the conditional probabilities used in Theorem 5.1 constitute a parametrization.

It follows from Theorem 4.12 that the collection of probabilities of the form

$$P(X_H = 0 \mid X_T = x_T), \qquad x_T \in \mathfrak{X}_T, \; H \in \mathcal{H}(\mathcal{G}),$$

is sufficient to recover the joint distribution under the model
$\mathcal{P}_{\mathcal{G}}$. However, it is not immediately clear that each of these
probabilities is necessary, or more specifically that the map in (3) is smooth and of
full rank everywhere.

For brevity we write $q_H(x_T) = P(X_H = 0 \mid X_T = x_T)$, and denote the vector of
all such probabilities by

$$q = (q_H(x_T) \mid H \in \mathcal{H}(\mathcal{G}), \; x_T \in \mathfrak{X}_T). \tag{6}$$

For $p \in \mathcal{P}_{\mathcal{G}}$ we, in a mild abuse of notation, let $q(p)$ be
the vector of the form (6) determined by calculating the appropriate conditional
probabilities from $p$. Since this only involves adding and dividing strictly
positive numbers, the map $q$ is smooth (infinitely differentiable). Denote the image
of $q$ over $\mathcal{P}_{\mathcal{G}}$ by $\mathcal{Q}_{\mathcal{G}}$. Our aim will
be to show that the map in (3) provides a smooth inverse to $q$. The first result
shows that the set of vectors $q$ that are valid parameters in the sense of leading
to probability distributions via (3) corresponds to the set
$\mathcal{Q}_{\mathcal{G}}$ of vectors of probabilities
$P(X_H = 0 \mid X_T = x_T)$ obtained from distributions in the Markov model
$\mathcal{P}_{\mathcal{G}}$.

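Theorem 6.1. For an ADMG $\mathcal{G}$, a vector of parameters $q$ is valid (i.e.
$q \in \mathcal{Q}_{\mathcal{G}}$) if and only if for each $x_V \in \mathfrak{X}_V$
we have

$$p_{x_V}(q) \equiv \sum_{C : x_V^{-1}(0) \subseteq C \subseteq V} (-1)^{|C \setminus x_V^{-1}(0)|} \prod_{H \in [C]_{\mathcal{G}}} q_H(x_T) > 0, \tag{7}$$

where $x_V^{-1}(0) = \{v \in V \mid x_v = 0\}$.

Remark 6.2. The boundary of the space is the set of $q$ for which
$p_{x_V}(q) \geq 0$ for all $x_V \in \mathfrak{X}_V$ and $p_{x_V}(q) = 0$ for at
least one $x_V \in \mathfrak{X}_V$.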
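As a small illustration of (7), consider a hypothetical graph with two vertices joined by a single bidirected edge, $1 \leftrightarrow 2$ (this graph is not one of the paper's figures). By Example 4.5 its heads are $\{1\}$, $\{2\}$ and $\{1,2\}$, all with empty tails, so the parameter vector is $q = (q_1, q_2, q_{12})$, and $[\{1,2\}] = \{\{1,2\}\}$ because both singletons precede $\{1,2\}$. Writing $p_{x_1 x_2}$ for $p_{x_V}(q)$, the four polynomials are:

```latex
\begin{align*}
p_{00}(q) &= q_{12}, \\
p_{01}(q) &= q_{1} - q_{12}, \\
p_{10}(q) &= q_{2} - q_{12}, \\
p_{11}(q) &= 1 - q_{1} - q_{2} + q_{12}.
\end{align*}
```

The four expressions sum to 1 for every $q$, and the positivity conditions in (7) reduce to $\max(0, q_1 + q_2 - 1) < q_{12} < \min(q_1, q_2)$, which are the familiar bounds on a joint probability given its two margins.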

The definition of $p_{x_V}(q)$ in (7) is of the same form as the expression given for
$P(X_V = x_V)$ in (3), and so the result might at first seem trivial; clearly probabilities must
be non-negative. However, it is not immediately obvious that this condition is
sufficient for validity of the parameters. If we take some $q^{\dagger} \notin \mathcal{Q}_{\mathcal{G}}$ and apply to it
the non-linear functional form in (7) to obtain $p(q^{\dagger})$, without this result there is
no apparent reason why $p(q^{\dagger})$ should not be a valid probability distribution, nor
indeed a probability distribution in $\mathcal{P}_{\mathcal{G}}$.
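The positivity criterion can be tried out numerically. The following is a minimal sketch, not the authors' code: it evaluates the alternating sum in (7) for a hypothetical two-vertex graph 1 <-> 2, in which every non-empty subset is a head with empty tail. The names `supersets`, `p_of_q`, `is_valid`, the hard-coded `partition`, and the numeric parameter values are all assumptions made for illustration.

```python
from itertools import combinations, product

def supersets(base, universe):
    """Yield every set C with base <= C <= universe."""
    extra = sorted(universe - base)
    for r in range(len(extra) + 1):
        for added in combinations(extra, r):
            yield base | frozenset(added)

def p_of_q(x, vertices, partition, tail, q):
    """Evaluate the alternating sum in (7) for one binary assignment x (a dict)."""
    zeros = frozenset(v for v in vertices if x[v] == 0)
    total = 0.0
    for C in supersets(zeros, vertices):
        term = (-1.0) ** len(C - zeros)
        for H in partition(C):
            x_T = tuple(x[t] for t in tail[H])
            term *= q[H][x_T]
        total += term
    return total

def is_valid(vertices, partition, tail, q):
    """Criterion of Theorem 6.1: every p_x(q) must be strictly positive."""
    order = sorted(vertices)
    return all(
        p_of_q(dict(zip(order, bits)), vertices, partition, tail, q) > 0
        for bits in product((0, 1), repeat=len(order))
    )

# Hypothetical graph 1 <-> 2: each non-empty subset is a head with empty tail.
V = frozenset({1, 2})
H1, H2, H12 = frozenset({1}), frozenset({2}), frozenset({1, 2})
tail = {H1: (), H2: (), H12: ()}

def partition(C):
    """[C]_G for this particular graph only (an assumption of the example)."""
    return [C] if C else []

q_good = {H1: {(): 0.6}, H2: {(): 0.5}, H12: {(): 0.4}}
q_bad = {H1: {(): 0.6}, H2: {(): 0.5}, H12: {(): 0.05}}

print(is_valid(V, partition, tail, q_good))  # all four cell probabilities positive
print(is_valid(V, partition, tail, q_bad))   # the cell x = (1, 1) becomes negative
```

In the second parameter vector each entry lies in (0, 1), yet the value obtained for the cell (1, 1) is 1 - 0.6 - 0.5 + 0.05 < 0, which illustrates why the inequalities in (7) are a genuine restriction.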

To prove Theorem 6.1 we need the following lemma.

Lemma 6.3. Let $A$ be an ancestral set in $\mathcal{G}$, and let $x_A \in \mathfrak{X}_A$. Then

$$\sum_{y_V : y_A = x_A} p_{y_V}(q) = \sum_{C : x_A^{-1}(0) \subseteq C \subseteq A} (-1)^{|C \setminus x_A^{-1}(0)|} \prod_{H \in [C]_{\mathcal{G}}} q_H(x_T),$$

where $x_A^{-1}(0) = \{v \in A \mid x_v = 0\}$. In particular,

$$\sum_{y_V} p_{y_V}(q) = 1.$$

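Proof. If $A = V$ the result is trivial. If not, pick some $v \in \mathrm{barren}_{\mathcal{G}}(V) \setminus A$; this is
possible because if $A \supseteq \mathrm{barren}_{\mathcal{G}}(V)$ then $A = V$ by ancestrality of $A$. So

$$\begin{aligned}
\sum_{y_V : y_A = x_A} p_{y_V}(q)
&= \sum_{y_V : y_A = x_A} \ \sum_{C : y_V^{-1}(0) \subseteq C \subseteq V} (-1)^{|C \setminus y_V^{-1}(0)|} \prod_{H \in [C]_{\mathcal{G}}} q_H(y_T) \\
&= \sum_{y_{V \setminus \{v\}} : y_A = x_A} \ \sum_{y_v} \ \sum_{C : y_V^{-1}(0) \subseteq C \subseteq V} (-1)^{|C \setminus y_V^{-1}(0)|} \prod_{H \in [C]_{\mathcal{G}}} q_H(y_T) \\
&= \sum_{y_{V \setminus \{v\}} : y_A = x_A} \Biggl( \sum_{C : y_{V \setminus \{v\}}^{-1}(0) \subseteq C \subseteq V} (-1)^{|C \setminus y_{V \setminus \{v\}}^{-1}(0)|} \prod_{H \in [C]_{\mathcal{G}}} q_H(y_T) \\
&\qquad\qquad + \sum_{C : y_{V \setminus \{v\}}^{-1}(0) \cup \{v\} \subseteq C \subseteq V} (-1)^{|C \setminus (y_{V \setminus \{v\}}^{-1}(0) \cup \{v\})|} \prod_{H \in [C]_{\mathcal{G}}} q_H(y_T) \Biggr).
\end{aligned}$$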

The last equation simply breaks the sum into cases where $y_v = 1$ and $y_v = 0$
respectively, which is possible because $v$ does not appear in any tail sets. The
first inner sum in the last expression can be further divided into cases where $C$
contains $v$, and those where it does not, giving

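$$\begin{aligned}
\sum_{y_V : y_A = x_A} p_{y_V}(q) = \sum_{y_{V \setminus \{v\}} : y_A = x_A} \Biggl(
& \sum_{C : y_{V \setminus \{v\}}^{-1}(0) \subseteq C \subseteq V \setminus \{v\}} (-1)^{|C \setminus y_{V \setminus \{v\}}^{-1}(0)|} \prod_{H \in [C]_{\mathcal{G}}} q_H(y_T) \\
& + \sum_{C : y_{V \setminus \{v\}}^{-1}(0) \cup \{v\} \subseteq C \subseteq V} (-1)^{|C \setminus y_{V \setminus \{v\}}^{-1}(0)|} \prod_{H \in [C]_{\mathcal{G}}} q_H(y_T) \\
& + \sum_{C : y_{V \setminus \{v\}}^{-1}(0) \cup \{v\} \subseteq C \subseteq V} (-1)^{|C \setminus (y_{V \setminus \{v\}}^{-1}(0) \cup \{v\})|} \prod_{H \in [C]_{\mathcal{G}}} q_H(y_T) \Biggr).
\end{aligned}$$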



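The second and third terms differ only by a factor of $-1$, and so cancel, leaving

$$\sum_{y_V : y_A = x_A} p_{y_V}(q) = \sum_{y_{V \setminus \{v\}} : y_A = x_A} \ \sum_{C : y_{V \setminus \{v\}}^{-1}(0) \subseteq C \subseteq V \setminus \{v\}} (-1)^{|C \setminus y_{V \setminus \{v\}}^{-1}(0)|} \prod_{H \in [C]_{\mathcal{G}}} q_H(y_T).$$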

Repeating this until no vertices outside $A$ are left gives

$$\sum_{y_V : y_A = x_A} p_{y_V}(q) = \sum_{C : y_A^{-1}(0) \subseteq C \subseteq A} (-1)^{|C \setminus y_A^{-1}(0)|} \prod_{H \in [C]_{\mathcal{G}}} q_H(y_T).$$
□
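
As a small check of the cancellation, consider a hypothetical two-vertex graph $1 \leftrightarrow 2$ (an illustration chosen here, not an example from the text). Every non-empty subset of $\{1, 2\}$ is then a head with empty tail, so the parameters are three numbers, written $q_1$, $q_2$, $q_{12}$ below (assumed shorthand). Taking the ancestral set $A = \{1\}$ and summing out vertex 2 gives:

```latex
\begin{align*}
p_{00}(q) &= q_{12}, & p_{01}(q) &= q_1 - q_{12}, \\
p_{10}(q) &= q_2 - q_{12}, & p_{11}(q) &= 1 - q_1 - q_2 + q_{12}, \\[4pt]
\sum_{y_2} p_{0 y_2}(q) &= q_{12} + (q_1 - q_{12}) = q_1, &
\sum_{y_2} p_{1 y_2}(q) &= (q_2 - q_{12}) + (1 - q_1 - q_2 + q_{12}) = 1 - q_1 .
\end{align*}
```

Both marginal sums agree with the right-hand side of Lemma 6.3: for $x_1 = 0$ the only admissible set is $C = \{1\}$, giving $q_1$, and for $x_1 = 1$ the sets $C = \emptyset$ and $C = \{1\}$ give $1 - q_1$. Adding the two confirms that the four values sum to 1.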



Proof of Theorem 6.1. The 'only if' part of the statement follows from the fact
that if the parameters are valid, then $p_{x_V}(q) = P(X_V = x_V)$, and is therefore
non-negative.

For the converse, suppose that the inequalities hold; we will show that we can
retrieve the parameters simply by calculating the appropriate conditional
probabilities. Lemma 6.3 ensures that $\sum_{x_V} p_{x_V}(q) = 1$, and that therefore this is a
valid probability distribution.

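Choose some $H^* \in \mathcal{H}(\mathcal{G})$, with $T^* = \mathrm{tail}_{\mathcal{G}}(H^*)$ and $A = \mathrm{an}_{\mathcal{G}}(H^*)$; also set $x_{H^*} = 0$
and pick $x_{T^*} \in \{0, 1\}^{|T^*|}$. By Lemma 6.3,

$$\sum_{y_V : y_A = x_A} p_{y_V}(q) = \sum_{C : y_A^{-1}(0) \subseteq C \subseteq A} (-1)^{|C \setminus y_A^{-1}(0)|} \prod_{H \in [C]_{\mathcal{G}}} q_H(y_T).$$

Now clearly $H^* \in \Phi_{\mathcal{G}}(A)$, so applying Lemma 3.3 and the fact that $H^* \subseteq y_A^{-1}(0)$
means we can factor out the parameter associated with $H^*$, giving

$$\begin{aligned}
\sum_{y_V : y_A = x_A} p_{y_V}(q)
&= q_{H^*}(y_{T^*}) \sum_{C : y_A^{-1}(0) \subseteq C \subseteq A} (-1)^{|C \setminus y_A^{-1}(0)|} \prod_{H \in [C \setminus H^*]_{\mathcal{G}}} q_H(y_T) \\
&= q_{H^*}(y_{T^*}) \sum_{C : y_{A \setminus H^*}^{-1}(0) \subseteq C \subseteq A \setminus H^*} (-1)^{|C \setminus y_{A \setminus H^*}^{-1}(0)|} \prod_{H \in [C]_{\mathcal{G}}} q_H(y_T).
\end{aligned}$$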

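But note that $A \setminus H^*$ is also an ancestral set, and thus using Lemma 6.3 again,

$$\sum_{y_V : y_{A \setminus H^*} = x_{A \setminus H^*}} p_{y_V}(q) = \sum_{C : y_{A \setminus H^*}^{-1}(0) \subseteq C \subseteq A \setminus H^*} (-1)^{|C \setminus y_{A \setminus H^*}^{-1}(0)|} \prod_{H \in [C]_{\mathcal{G}}} q_H(y_T).$$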

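Hence

$$\frac{\sum_{y_V : y_A = x_A} p_{y_V}(q)}{\sum_{y_V : y_{A \setminus H^*} = x_{A \setminus H^*}} p_{y_V}(q)} = q_{H^*}(x_{T^*}),$$

and we can recover the original parameters from the probability distribution $p$
in the manner we would expect; that $p$ satisfies the global Markov property for
$\mathcal{G}$ then follows from Theorem 5.1. Thus $p \in \mathcal{P}_{\mathcal{G}}$ and $q = q(p) \in \mathcal{Q}_{\mathcal{G}}$, so the
parameters are valid. □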
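
The recovery step in this proof can be illustrated on a toy case. The sketch below assumes a hypothetical graph 1 -> 2 (a DAG, so the heads are the singletons, with tail({1}) empty and tail({2}) = {1}); the parameter values and the helper names `p_of_q` and `recover_q2` are invented for the example. It builds the four cell probabilities from the alternating sum and then recovers each parameter as a ratio of two marginal sums, as in the proof.

```python
from itertools import product

# Hypothetical graph 1 -> 2: head {1} has empty tail, head {2} has tail {1}.
q1 = 0.3                      # P(X1 = 0)
q2 = {0: 0.8, 1: 0.25}        # P(X2 = 0 | X1 = x1)

def p_of_q(x1, x2):
    """Alternating sum (7) for this graph; here [C]_G splits C into singletons."""
    zeros = {v for v, val in ((1, x1), (2, x2)) if val == 0}
    total = 0.0
    for C in ({1, 2}, {1}, {2}, set()):
        if zeros <= C:
            term = (-1.0) ** len(C - zeros)
            if 1 in C:
                term *= q1
            if 2 in C:
                term *= q2[x1]   # tail of {2} is {1}, so the factor depends on x1
            total += term
    return total

p = {(a, b): p_of_q(a, b) for a, b in product((0, 1), repeat=2)}

def recover_q2(x1):
    """Ratio of marginals with H* = {2}, A = {1, 2} and x_{H*} = 0."""
    numerator = p[(x1, 0)]                   # sum over y_V with y_A = x_A
    denominator = p[(x1, 0)] + p[(x1, 1)]    # sum over y_V with y_{A \ H*} = x_{A \ H*}
    return numerator / denominator

# For H* = {1} we have A = {1}, so A \ H* is empty and the denominator is 1.
recovered_q1 = p[(0, 0)] + p[(0, 1)]

print(recovered_q1, recover_q2(0), recover_q2(1))
```

Up to floating-point rounding, the printed values coincide with the parameters `q1`, `q2[0]` and `q2[1]` that were used to build `p`.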

Theorem 6.4. For an ADMG $\mathcal{G}$, the model $\mathcal{P}_{\mathcal{G}}$ of strictly positive binary probability
distributions satisfying the global Markov property with respect to $\mathcal{G}$ is smoothly
parametrized by $q \in \mathcal{Q}_{\mathcal{G}}$.
Consequently the model $\mathcal{P}_{\mathcal{G}}$ is a curved exponential family of dimension

$$d = \sum_{H \in \mathcal{H}(\mathcal{G})} |\mathfrak{X}_{\mathrm{tail}(H)}| = \sum_{H \in \mathcal{H}(\mathcal{G})} 2^{|\mathrm{tail}(H)|}.$$

Proof. By Theorem 6.1 the set $\mathcal{Q}_{\mathcal{G}} \subseteq \mathbb{R}^d$ is open. The map $p(q) : \mathcal{Q}_{\mathcal{G}} \to \mathcal{P}_{\mathcal{G}}$
is multi-linear, and therefore infinitely differentiable. Its inverse $q : \mathcal{P}_{\mathcal{G}} \to \mathcal{Q}_{\mathcal{G}}$ is
also infinitely differentiable.

The composition $q \circ p$ is the identity function on $\mathcal{Q}_{\mathcal{G}}$, and therefore its Jacobian
is the identity matrix $I_d$. However, the Jacobian of a composition of differentiable
functions is the product of the Jacobians, so

$$I_d = \frac{\partial q}{\partial p} \, \frac{\partial p}{\partial q}.$$

But this implies that each of the Jacobians has full rank $d$, and therefore the map
$q$ is a smooth parametrization of $\mathcal{P}_{\mathcal{G}}$. □
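
To make the Jacobian argument concrete, here is a worked instance for a hypothetical graph $1 \leftrightarrow 2$ (an assumption for illustration, not an example from the text). All three heads $\{1\}$, $\{2\}$, $\{1,2\}$ have empty tails, so $d = 3$. With the orderings $p = (p_{00}, p_{01}, p_{10}, p_{11})$ and $q = (q_1, q_2, q_{12})$, and taking $q_1 = p_{00} + p_{01}$, $q_2 = p_{00} + p_{10}$, $q_{12} = p_{00}$ as one way of writing $q$ in the ambient coordinates:

```latex
\[
\frac{\partial q}{\partial p} \, \frac{\partial p}{\partial q}
=
\begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 0
\end{pmatrix}
\begin{pmatrix}
0 & 0 & 1 \\
1 & 0 & -1 \\
0 & 1 & -1 \\
-1 & -1 & 1
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
= I_3 .
\]
```

Since the product is $I_3$, neither factor can have rank below 3, which is the full-rank conclusion of the proof in this small case.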



7 Discussion 



We remark that it is easy to extend the results of this paper from the binary case
to a general finite discrete state-space; we have avoided this only for notational
simplicity. It is also a simple matter to extend the results from ADMGs to the
summary graphs of Wermuth (2011), which incorporate three types of edge:
directed ($\rightarrow$), undirected ($-$), and dashed (- - -); the dashed edges are equivalent to
bidirected ($\leftrightarrow$) edges (Sadeghi and Lauritzen, 2011). The undirected component
of a summary graph can be dealt with using standard methods for undirected
graphs (Lauritzen, 1996), and the remaining inference is done conditionally, as
for an ADMG.
A Technical Proofs



Proof of Lemma 4.9. Suppose that two heads $H_1, H_2 \subseteq W$ are distinct and $H_1 \cap
H_2 \neq \emptyset$. We will show that they are dominated by some $H^* \subseteq H_1 \cup H_2$, so that
they cannot both be maximal under $\prec$ in $W$.






Let $H^* = \mathrm{barren}_{\mathcal{G}}(H_1 \cup H_2)$. We first claim that $H^*$ is a head: clearly it is barren,
so we need to prove that it is a district in $\mathrm{an}_{\mathcal{G}}(H^*)$. By definition, $\mathrm{an}_{\mathcal{G}}(H^*) \supseteq
H_1 \cup H_2$; we need to find a bidirected path between any $v, w \in H^* \subseteq H_1 \cup H_2$.
If $v$ and $w$ are either both in $H_1$ or both in $H_2$, then the existence of such a
path follows from the fact that these are heads. If $v \in H_1$ and $w \in H_2$ then
construct a bidirected path in $\mathrm{an}_{\mathcal{G}}(H_1)$ from $v$ to some vertex $x \in H_1 \cap H_2$, and
a bidirected path in $\mathrm{an}_{\mathcal{G}}(H_2)$ from $x$ to $w$; these paths can then be concatenated
into a new path meeting the requirements, shortening the resulting sequence of
edges if necessary to avoid repetition of vertices. Hence $H^*$ is a head.

Further, $H^*$ is clearly in $W$, and also $H_1, H_2 \subseteq \mathrm{an}_{\mathcal{G}}(H^*)$, so $H_i \preceq H^*$ for each
$i = 1, 2$. □



Proof of Factorization 



Proposition A.1. Let $\prec$ and $\prec'$ be two partition-suitable partial orderings for
$\mathcal{H}$, such that for every $H \in \mathcal{H}$ and $W \subseteq V$, $H$ is maximal in $W$ under $\prec$ whenever
this is so under $\prec'$. Then $[\cdot]^{\prec} = [\cdot]^{\prec'}$.

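Proof. We again proceed by induction on the size of $W$; clearly $[\{v\}]^{\prec} = [\{v\}]^{\prec'} =
\{\{v\}\}$ for all $v \in V$. Now take a general $W \subseteq V$, and suppose that $H$ is maximal
under $\prec'$ in $W$; then by Proposition

$$\begin{aligned}
[W]^{\prec'} &= \{H\} \cup [W \setminus H]^{\prec'} \\
&= \{H\} \cup [W \setminus H]^{\prec} \\
&= [W]^{\prec},
\end{aligned}$$

by applying the induction hypothesis to $W \setminus H$ and the fact that $H \in [W]^{\prec}$
because it is maximal under $\prec$ in $W$. □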



Define a weaker partial ordering $\prec^*$ on heads in an ADMG by $H_1 \prec^* H_2$ if and
only if both $H_1 \prec H_2$, and $H_1$ and $H_2$ are contained in the same district in
$\mathrm{an}_{\mathcal{G}}(H_1 \cup H_2)$. It is easy to see that $\prec^*$ is partition-suitable for heads $\mathcal{H}(\mathcal{G})$ by
repeating the proof of Lemma 4.9. Clearly sets which are maximal under $\prec$ will
also be maximal under $\prec^*$, so the partitions defined by these two orderings are
the same by Proposition A.1.

Definition A.2. Let $C \subseteq V$. We say that an ordering $<$ on the vertices of $C$ is
$(C, \prec^*)$-consistent if for any $H_1, H_2 \in [C]_{\mathcal{G}}$ such that $H_1 \prec^* H_2$, we have $v_1 < v_2$
for all $v_1 \in H_1$, $v_2 \in H_2$.

Lemma A.3. Let $D_1$ and $D_2$ be unions of ancestrally closed districts in $\mathcal{G}$, and
$D_1 \cap D_2 = \emptyset$. Let $<_1$ and $<_2$ be orderings on $D_1$ and $D_2$ (respectively). If
for $i = 1, 2$, $<_i$ is $(D_i, \prec^*)$-consistent then every extension of $<_1$ and $<_2$ to an
ordering $<$ on $D_1 \cup D_2$ is also a $(D_1 \cup D_2, \prec^*)$-consistent ordering.

Proof. Orderings between vertices $v_1, v_2 \in D_i$ are specified by $<_i$. Further, if
$v_1 \in D_1$ and $v_2 \in D_2$ then since $v_1$ and $v_2$ are in different ancestrally closed
districts it follows from the definition of $\prec^*$ that $v_1$ and $v_2$ can be ordered in any
way to achieve a $(D_1 \cup D_2, \prec^*)$-consistent ordering. □

A total ordering $<_i$ on a set $D_i$ will be said to be topological in $\mathcal{G}$ if no vertex
$d \in D_i$ precedes any of its ancestors in $\mathcal{G}$ that are in $D_i$.

Lemma A.4. Let $D_1$ and $D_2$ be disjoint subsets in $\mathcal{G}$. Let $<_1$ and $<_2$ be
topological orderings on $D_1$ and $D_2$ (respectively). Then there exists an extension of
$<_1$ and $<_2$ to an ordering $<$ on $D_1 \cup D_2$ that is topological.

Proof. We construct a topological ordering iteratively as follows: let $(d_1, \ldots, d_{k-1})$
be the first $k - 1$ vertices in $D_1 \cup D_2$ already ordered under $<$; let $E_k = (D_1 \cup
D_2) \setminus \{d_1, \ldots, d_{k-1}\}$ be the set of vertices remaining to be ordered. Further, let
$Q_k = \{d \mid d \in E_k,\ \mathrm{an}_{\mathcal{G}}(d) \cap E_k = \{d\}\}$ be those vertices in $E_k$ that have no proper
ancestors in $E_k$; $Q_k \neq \emptyset$ since $V$ is finite and $\mathcal{G}$ is acyclic. Finally, if $Q_k \cap D_1 \neq \emptyset$,
define $d_k$ to be the first element in $Q_k$ under $<_1$, otherwise define $d_k$ to be the
first element in $Q_k$ under $<_2$. That the ordering is topological follows from the
definition of $Q_k$. □
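
The iterative construction in this proof translates directly into code. The following is a minimal sketch under stated assumptions: the directed part of the graph is given as a hypothetical `parents` dictionary, the two orderings are Python lists, and the function names (`ancestors`, `merge_topological`, `is_topological`) and the four-vertex example are invented for illustration.

```python
def ancestors(v, parents):
    """Return an_G(v): v together with every vertex that has a directed path to v."""
    seen, stack = {v}, [v]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def merge_topological(order1, order2, parents):
    """Merge two orderings by repeatedly picking from Q_k, as in the proof."""
    remaining = set(order1) | set(order2)          # E_k
    merged = []
    while remaining:
        # Q_k: remaining vertices with no proper ancestor among the remaining ones
        ready = {d for d in remaining if ancestors(d, parents) & remaining == {d}}
        in_first = [d for d in order1 if d in ready]
        in_second = [d for d in order2 if d in ready]
        d_k = in_first[0] if in_first else in_second[0]
        merged.append(d_k)
        remaining.remove(d_k)
    return merged

def is_topological(order, parents):
    """Check that no vertex precedes one of its ancestors within the ordering."""
    position = {v: i for i, v in enumerate(order)}
    return all(position[a] <= position[v]
               for v in order
               for a in ancestors(v, parents) if a in position)

# Hypothetical directed edges c -> b and b -> d; D1 = {a, b}, D2 = {c, d}.
parents = {"b": ["c"], "d": ["b"]}
order1 = ["a", "b"]
order2 = ["c", "d"]
merged = merge_topological(order1, order2, parents)
print(merged)  # ['a', 'c', 'b', 'd']
```

On this example the merged order is topological and keeps the relative order of both inputs; the sketch only verifies the topological property in general, and recomputing ancestor sets in every round is chosen for clarity rather than efficiency.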

Lemma A.5. Let $C$ and $C \cup W$ be (unions of) ancestrally closed districts, with
$W \subseteq \mathrm{barren}_{\mathcal{G}}(C \cup W)$. Then there exists a topological ordering of the vertices in
$C \cup W$ which is both $(C, \prec^*)$- and $(C \cup W, \prec^*)$-consistent.

Proof. We proceed by induction on the size of $C \cup W$; if $|C \cup W| = 0$ or $1$ then
the result is trivial.

If $C \cup W$ is a non-trivial union of ancestrally closed districts, then we can split it
into two smaller sets $C_1 \cup W_1$ and $C_2 \cup W_2$, where $C = C_1 \sqcup C_2$ and $W = W_1 \sqcup W_2$,
where $\sqcup$ indicates a disjoint union. Clearly $W_i \subseteq \mathrm{barren}_{\mathcal{G}}(C_i \cup W_i)$ for each $i$, so
using the induction hypothesis, we can find topological orderings $<_i$ on the vertices
of $C_i \cup W_i$ which are both $(C_i \cup W_i, \prec^*)$- and $(C_i, \prec^*)$-consistent. It then follows
from Lemma A.4 (taking $D_i = C_i \cup W_i$) that there exists a topological ordering
$<$ on $C \cup W$ that extends $<_1$ and $<_2$. It further follows from two applications of
Lemma A.3 that $<$ is both $(C, \prec^*)$- and $(C \cup W, \prec^*)$-consistent.

Assume that $C \cup W$ is an ancestrally closed district, and let $H = \mathrm{barren}_{\mathcal{G}}(C \cup W)$;
this is clearly a head and maximal under $\prec^*$ in $C \cup W$. Further $W \subseteq H$, so applying
Proposition 3.5 gives

$$\begin{aligned}
[C \cup W]_{\mathcal{G}} &= \{H\} \cup [(C \cup W) \setminus H]_{\mathcal{G}} \\
&= \{H\} \cup [C \setminus (H \setminus W)]_{\mathcal{G}},
\end{aligned}$$

since $W \cap C = \emptyset$. Now, by the induction hypothesis, we can find a topological
ordering of $C$ which is both $(C \setminus (H \setminus W), \prec^*)$- and $(C, \prec^*)$-consistent (possibly
$C \setminus (H \setminus W) = C$, in which case this is trivial). Then clearly extending this ordering
so that everything in $W$ comes after everything in $C$ gives an ordering which is
$(C \cup W, \prec^*)$-consistent, since $H \supseteq W$ is maximal; since $W$ is barren in $C \cup W$,
the ordering is also topological. □

Corollary A.6. If $A$ and $A \cup \{w\}$ are ancestral sets, then there exists an ordering
$<$ which is both $(A, \prec^*)$- and $(A \cup \{w\}, \prec^*)$-consistent.

Proof. Clearly $\{w\}$ is barren in $A \cup \{w\}$, and ancestral sets are (unions of)
ancestrally closed districts. □

Note that this Corollary does not generalize to adding two vertices: if $A$, $A \cup \{w_1\}$
and $A \cup \{w_1, w_2\}$ are ancestral then there are graphs where there does not exist
any topological ordering $<$ that is $(A, \prec^*)$-, $(A \cup \{w_1\}, \prec^*)$- and
$(A \cup \{w_1, w_2\}, \prec^*)$-consistent.

Given a path, $\pi$, and two vertices $v, w$ on $\pi$, the subpath $\pi(v, w)$ is the sequence
of edges which lie between $v$ and $w$ on $\pi$. As with a path, we allow a single vertex
(and no edges) to be a degenerate case of a subpath.

Lemma A.7. Suppose $\pi$ is a path from $a$ to $b$, and is not blocked by $C$. Then
every vertex $v$ on $\pi$ is contained in $\mathrm{an}_{\mathcal{G}}(\{a, b\} \cup C)$.

Proof. Suppose $w$ is on $\pi$ and is an ancestor of neither $a$ nor $b$. Then on each of
the subpaths $\pi(a, w)$ and $\pi(w, b)$, there is at least one edge with an arrowhead
pointing towards $w$ along the subpath. Let $v_{aw}$ and $v_{wb}$ be the vertices at which
such arrowheads occur that are closest to $w$ on the respective subpaths. There
are now three cases: (1) If $w \neq v_{wb}$ then $\pi(w, v_{wb})$ is a directed path from $w$ to
$v_{wb}$. It further follows that $v_{wb}$ is a collider on $\pi$, hence an ancestor of $C$, since
the path is not blocked. Hence $w \in \mathrm{an}_{\mathcal{G}}(C)$. (2) If $w \neq v_{aw}$ then a symmetric
argument holds. (3) If $v_{aw} = w = v_{wb}$ then $w$ is a collider on $\pi$, hence again an
ancestor of $C$. □



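Lemma A.8. Let $<$ be an $(A, \prec^*)$-consistent topological ordering for some
ancestral set $A$, and let $h \in H \in [A]_{\mathcal{G}}$. Then

$$B = \bigl(\mathrm{dis}_{\mathrm{pre}_<(h)}(h) \setminus \{h\}\bigr) \cup \mathrm{pa}_{\mathcal{G}}\bigl(\mathrm{dis}_{\mathrm{pre}_<(h)}(h)\bigr)$$

is the Markov blanket for $h$ in the set $C = (H \cap \mathrm{pre}_<(h)) \cup \mathrm{tail}_{\mathcal{G}}(H)$.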

Proof. Let $\pi$ be a path from $h$ to some $v \in C \setminus B$, and assume without loss of
generality that $\pi$ does not intersect $C \setminus B$ other than at $v$. We will show that $\pi$
is blocked.

Note that $C \subseteq \mathrm{pre}_<(h)$; thus if $\pi$ includes any vertex $w > h$, then $\pi$ is blocked by
Lemma A.7, because $w$ is not an ancestor of any element of $C$. In particular, the
edge on $\pi$ adjacent to $h$ is of the form $h \leftrightarrow$ or $h \leftarrow$.

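Now we claim that $\pi$ contains at least one non-collider; if not then it is of the
form

$$h \leftrightarrow \cdots \leftrightarrow v, \qquad h \leftrightarrow \cdots \leftarrow v, \qquad \text{or} \qquad h \leftarrow v,$$

but each of these implies that $v \in B$, contradicting the choice of $v \in C \setminus B$. It follows
by the same argument that the non-collider on $\pi$ which is nearest to $h$ must also be
in $B$, and that therefore $\pi$ is blocked. □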



Proof of Theorem 4.12. We proceed by induction. Clearly the result holds if $|A| \leq 1$.

If $|A| > 1$ then let $w \in \mathrm{barren}_{\mathcal{G}}(A)$; thus $A' = A \setminus \{w\}$ is also ancestral. By
elementary laws of probability and the induction hypothesis,

$$\begin{aligned}
f_A(x_A) &= f_{w|A'}(x_w \mid x_{A'}) \cdot f_{A'}(x_{A'}) \\
&= f_{w|A'}(x_w \mid x_{A'}) \prod_{H' \in [A']_{\mathcal{G}}} f_{H'|T'}(x_{H'} \mid x_{T'}).
\end{aligned}$$

Let $<$ be a topological ordering which is both $(A, \prec^*)$- and $(A', \prec^*)$-consistent;
by Corollary A.6 such an ordering exists. For $v \in A$ define $H_v = \{h \in H \cap
\mathrm{pre}_<(v) \mid v \in H \in [A]_{\mathcal{G}}\}$ and $H'_v$ similarly for $A'$. We can decompose the factors
in the product into univariate conditionals using the standard laws of probability as

$$f_{H'|T'}(x_{H'} \mid x_{T'}) = \prod_{v \in H'} f_{v | H'_v \cup T'}(x_v \mid x_{H'_v \cup T'}).$$
Letting $v \in H \in [A]_{\mathcal{G}}$, by Lemma A.8 the Markov blankets of $v$ in the sets $H_v \cup T$
and $H'_v \cup T'$ are the same, and so

$$f_{v | H'_v \cup T'}(x_v \mid x_{H'_v \cup T'}) = f_{v | H_v \cup T}(x_v \mid x_{H_v \cup T}).$$

Now let $H^*$ be the head such that $w \in H^* \in [A]_{\mathcal{G}}$; since $w \in \mathrm{barren}_{\mathcal{G}}(A)$, it is
m-separated from $A' \setminus (H^* \cup T^*)$ by $H^* \cup T^*$, and thus the global Markov property
gives

$$f_{w|A'}(x_w \mid x_{A'}) = f_{w | H^* \cup T^*}(x_w \mid x_{H^* \cup T^*}).$$

Thus

$$\begin{aligned}
f_A(x_A) &= \prod_{v \in A} f_{v | H_v \cup T}(x_v \mid x_{H_v \cup T}) \\
&= \prod_{H \in [A]_{\mathcal{G}}} f_{H|T}(x_H \mid x_T)
\end{aligned}$$

as required. □